# Synthesis Of VHDL Code

RTL Hardware Design Chapter 6 1

## **Outline**

- 1. Fundamental limitation of EDA software
- 2. Realization of VHDL operator
- 3. Realization of VHDL data type
- 4. VHDL synthesis flow
- 5. Timing consideration

#### 1. Fundamental limitation of EDA software

- What is the synthesis of VHDL code
  - the process of realizing the VHDL description using primitive logic cells from the target device's library.
- Can "C-to-hardware" be done?
- EDA tools:
  - Core: the algorithm that perform the transformation or optimization
  - Shell: wrapping the algorithm
- What does theoretical computer science say?
  - Computability
  - Computation complexity

RTL Hardware Design Chapter 6

3

# Computability

- A problem is computable if an algorithm exists.
- E.g., "halting problem":
  - can we develop a program that takes any program and its input, and determines whether the computation of that program will eventually halt?
- any attempt to examine the "meaning" of a program is uncomputable

## Computation complexity

- How fast an algorithm can run (or how good an algorithm is)? – time complexity
  - "Interferences" in measuring execution time: types of CPU, speed of CPU, compiler etc.
- How much hardware resources are used space complexity

RTL Hardware Design

Chapter 6

5

## Big-O notation

- f(n) is O(g(n)): if  $n_0$  and c can be found to satisfy: f(n) < cg(n) for any  $n, n > n_0$
- g(n) is simple function:  $1, n, \log_2 n, n^2, n^3, 2^n$
- Following are  $O(n^2)$ :
  - $0.1n^2$
  - $n^2 + 5n + 9$
  - $500n^2 + 1000000$

## Interpretation of Big-O

- Filter out the "interference": constants and less important terms
- n is the input size of an algorithm
- The "scaling factor" of an algorithm:
   What happens if the input size increases

RTL Hardware Design Chapter 6 7

E.g.,

| input size | Big-O function      |                     |                      |                      |                      |                      |  |  |
|------------|---------------------|---------------------|----------------------|----------------------|----------------------|----------------------|--|--|
|            | n                   | $\log_2 n$          | $n \log_2 n$         | $n^2$                | $n^3$                | $2^n$                |  |  |
| 2          | $2 \mu s$           | $1  \mu \mathrm{s}$ | $2~\mu \mathrm{s}$   | $4  \mu \mathrm{s}$  | $8  \mu \mathrm{s}$  | $4 \mu s$            |  |  |
| 4          | $4  \mu \mathrm{s}$ | $2~\mu \mathrm{s}$  | $8  \mu \mathrm{s}$  | $16  \mu \mathrm{s}$ | $64~\mu \mathrm{s}$  | $16 \mu\mathrm{s}$   |  |  |
| 8          | $8  \mu \mathrm{s}$ | $3 \mu s$           | $24~\mu \mathrm{s}$  | $64~\mu \mathrm{s}$  | $512~\mu \mathrm{s}$ | $256~\mu \mathrm{s}$ |  |  |
| 16         | $16 \mu s$          | $4~\mu \mathrm{s}$  | $64~\mu \mathrm{s}$  | $256~\mu \mathrm{s}$ | 4 ms                 | 66 ms                |  |  |
| 32         | $32~\mu s$          | $5~\mu \mathrm{s}$  | $160~\mu \mathrm{s}$ | 1 ms                 | 33 ms                | 71 min               |  |  |
| 48         | $48~\mu s$          | $5.5~\mu\mathrm{s}$ | $268~\mu \mathrm{s}$ | $2~m\mathrm{s}$      | 111 ms               | 9 year               |  |  |
| 64         | $64~\mu \mathrm{s}$ | $6  \mu \mathrm{s}$ | $384~\mu \mathrm{s}$ | 4 ms                 | $262~m\mathrm{s}$    | 600,000 year         |  |  |

- Intractable problems:
  - algorithms with  $O(2^n)$
  - Not realistic for a larger n
  - Frequently tractable algorithms for sub-optimal solution exist
- Many problems encountered in synthesis are intractable

RTL Hardware Design Chapter 6 9

#### Theoretical limitation

- Synthesis software does not know your intention
- Synthesis software cannot obtain the optimal solution
- Synthesis should be treated as transformation and a "local search" in the "design space"
- Good VHDL code provides a good starting point for the local search
- The quality and efficiency of a design still rely on a human designer's experience, insight, ingenuity and imagination
  - the ultimate heuristics that cannot be coded into software

RTL Hardware Design Chapter 6 10

# 2. Realization of VHDL operator

- Logic operator
  - Simple, direct mapping
- Relational operator
  - =, /= fast, simple implementation exists
  - ->, < etc: more complex implementation, larger delay
- Addition operator
- Other arithematic operators: support varies

RTL Hardware Design

Chapter 6

11

- Operator with two constant operands:
  - Simplified in preprocessing
  - No hardware inferred
  - Good for documentation
  - E.g.,

```
constant OFFSET: integer := 8;
signal boundary: unsigned(8 downto 0);
signal overflow: std_logic;
. . .
overflow <= '1' when boundary > (2**OFFSET-1) else
    '0':
```

- Operator with one constant operand:
  - Can significantly reduce the hardware complexity
  - E.g., adder vs. incrementor
  - -E.g
    - y <= rotate\_right(x, z); -- barrel shifter
    - y <= rotate\_right(x, 3); -- rewiring
    - $y \le x(2 \text{ downto } 0) \& x(7 \text{ downto } 3);$
  - E.g., 4-bit comparator: x=y vs. x=0

$$(x_3 \oplus y_3)' \cdot (x_2 \oplus y_2)' \cdot (x_1 \oplus y_1)' \cdot (x_0 \oplus y_0)'$$
  
 $x_3' \cdot x_2' \cdot x_1' \cdot x_0'$ 

RTL Hardware Design

Chapter 6

13

# An example 0.55 um standard-cell CMOS implementation

| width |      |     |       | VE    | IDL op  | perator | 1      |       |       |     |
|-------|------|-----|-------|-------|---------|---------|--------|-------|-------|-----|
|       | nand | xor | $>_a$ | $>_d$ | =       | $+1_a$  | $+1_d$ | $+_a$ | $+_d$ | mux |
|       |      |     |       | are   | a (gate | count)  | )      |       |       |     |
| 8     | 8    | 22  | 25    | 68    | 26      | 27      | 33     | 51    | 118   | 21  |
| 16    | 16   | 44  | 52    | 102   | 51      | 55      | 73     | 101   | 265   | 42  |
| 32    | 32   | 85  | 105   | 211   | 102     | 113     | 153    | 203   | 437   | 85  |
| 64    | 64   | 171 | 212   | 398   | 204     | 227     | 313    | 405   | 755   | 171 |
|       |      |     |       |       | delay   | (ns)    |        |       |       |     |
| 8     | 0.1  | 0.4 | 4.0   | 1.9   | 1.0     | 2.4     | 1.5    | 4.2   | 3.2   | 0.3 |
| 16    | 0.1  | 0.4 | 8.6   | 3.7   | 1.7     | 5.5     | 3.3    | 8.2   | 5.5   | 0.3 |
| 32    | 0.1  | 0.4 | 17.6  | 6.7   | 1.8     | 11.6    | 7.5    | 16.2  | 11.1  | 0.3 |
| 64    | 0.1  | 0.4 | 35.7  | 14.3  | 2.2     | 24.0    | 15.7   | 32.2  | 22.9  | 0.3 |

## 3. Realization of VHDL data type

- Use and synthesis of 'Z'
- Use of '-'

RTL Hardware Design Chapter 6 15

# Use and synthesis of 'Z'

- Tri-state buffer:
  - Output with "high-impedance"
  - Not a value in Boolean algebra
  - Need special output circuitry (tri-state buffer)



| oe | y    |
|----|------|
| 0  | Z    |
| 1  | a_in |

- Major application:
  - Bi-directional I/O pins
  - Tri-state bus
- VHDL description:

```
y <= 'Z' when oe='1' else
a_in;
```



· 'Z' cannot be used as input or manipulated



RTL Hardware Design

Chapter 6

17

Separate tri-state buffer from regular code:

#### Less clear:

```
with sel select
```

```
y <= 'Z' when "00",
'1' when "01"|"11",
'0' when others:
```

#### – better:

with sel select



# Bi-directional i/o pins dir -የግን sig\_out sig\_in entity bi\_demo is port(bi: inout std\_logic; begin sig\_out <= output\_expression;</pre> . . . <= expression\_with\_sig\_in; bi <= sig\_out when dir='1' else 'Z';</pre> sig\_in <= bi; RTL Hardware Design dir sig\_out



## Tri-state bus



RTL Hardware Design

Chapter 6

21

- Problem with tri-state bus
  - Difficult to optimize, verify and test
  - Technology dependent, and hence less portable.
- Alternative to tri-state bus: mux

RTL Hardware Design

Chapter 6

23

#### Use of '-'

- In conventional logic design
  - '-' as input value: shorthand to make table compact
  - E.g.,

| input       | output |
|-------------|--------|
| req         | code   |
| 100         | 10     |
| 101         | 10     |
| 1 1 0       | 10     |
| 1 1 1       | 10     |
| 010         | 01     |
| 0 1 1       | 01     |
| 0 0 1       | 00     |
| $0 \ 0 \ 0$ | 00     |

| input | output |
|-------|--------|
| req   | code   |
| 1     | 10     |
| 01-   | 01     |
| 0 0 1 | 00     |
| 0 0 0 | 00     |

- '-' as output value: help simplification
- E.g.,
  - '-' assigned to 1: a + b
  - '-' assigned to 0: a'b + ab'

| input | output |
|-------|--------|
| a b   | f      |
| 0 0   | 0      |
| 0 1   | 1      |
| 10    | 1      |
| 1 1   | _      |

RTL Hardware Design Chapter 6 25

## Use '-' in VHDL

- As input value (against our intuition):
- Wrong:

```
y <= "10" when req="1--" else
"01" when req="01-" else
"00" when req="001" else
"00":
```

#### • Fix #1:

```
y <= "10" when req(3)='1' else
    "01" when req(3 downto 2)="01" else
    "00" when req(3 downto 1)="001" else
    "00";</pre>
```

#### • Fix #2:

RTL Hardware Design

Chapter 6

27

### Wrong:

#### • Fix:

- '-' as an output value in VHDL
- May work with some software

RTL Hardware Design Chapter 6 29

# 4. VHDL Synthesis Flow

- Synthesis:
  - Realize VHDL code using logic cells from the device's library
  - a refinement process
- Main steps:
  - RT level synthesis
  - Logic synthesis
  - Technology mapping



# RT level synthesis

Ch

- Realize VHDL code using RT-level components
- Somewhat like the derivation of the conceptual diagram
- Limited optimization
- Generated netlist includes
  - "regular" logic: e.g., adder, comparator
  - "random" logic: e.g., truth table description



31

## Module generator

"regular" logic can be replaced by pre-designed

module

- Pre-designed module is more efficient
- Module can be generated in different levels of detail
- Reduce the processing time

RT level synthesis and optiomiztion

RT level netlist

module generator

module generator

pre-designed module library

gate level netlist

Technology mapping

cell library

Ch. cell level netlist

33

RTL Hardware Design

## Logic Synthesis

Realize the circuit with the optimal number of "generic"

Ch

gate level components

Process the "random" logic

- Two categories:
  - Two-level synthesis: sum-ofproduct format
  - Multi-level synthesis



## • E.g.,



RTL Hardware Design Chapter 6 35

## Technology mapping

- Map "generic" gates to "device-dependent" logic cells
- The technology library is provided by the vendors who manufactured (in FPGA) or will manufacture (in ASIC) the device



Ch

# E.g., mapping in standard-cell ASIC

Device library



RTL Hardware Design

37

#### • Cost: 31 vs. 17



(a) Initial mapping



RTL Hardware De

(b) Better mapping

## E.g., mapping in FPGA

With 5-input LUT (Look-Up-Table) cells



## Effective use of synthesis software

- Logic operators: software can do a good job
- Relational/Arith operators: manual intervention needed
- "layout" and "routing structure":
  - Silicon chip is 2-dimensional square
  - "rectangular" or "tree-shaped" circuit is easier to optimize

RTL Hardware Design Chapter 6 40



## 5. Combinational Circuit Timing consideration

- Propagation delay
- Synthesis with timing constraint
- Hazards
- Delay-sensitive design

## Propagation delay

- Delay: time required to propagate a signal from an input port to an output port
- · Cell level delay: most accurate
- · Simplified model:

$$delay = d_{intrinsic} + r * C_{load}$$

 The impact of wire becomes more dominant as the transistor becomes smaller



RTL Hardware Design

Chapter 6

## System delay

- · The longest path (critical path) in the system
- The worst input to output delay





## "False path" may exists:



RTL Hardware Design Chapter 6 45

- RT level delay estimation:
  - Difficult if the design is mainly "random" logic
  - Critical path can be identified if many complex operators (such adder) are used in the design.

## Synthesis with timing constraint

· Area-delay trade-off curve



## Synthesis with timing constraint

- Multi-level synthesis is flexible
- It is possible to reduce delay by adding extra logic



- Synthesis with timing constraint
  - 1. Obtain the minimal-area implementation
  - 2. Identify the critical path
  - 3. Reduce the delay by adding extra logic
  - 4. Repeat 2 & 3 until meeting the constraint

RTL Hardware Design Chapter 6 48

 Improvement in "architectural" level design (better VHDL code to start with)



Timing Hazards

- Propagation delay: time to obtain a stable output
- Hazards: the fluctuation occurring during the transient period
  - Static hazard: glitch when the signal should be stable
  - Dynamic hazard: a glitch in transition
- Due to the multiple converging paths of an output port

#### • E.g., static-hazard (sh=ab'+bc; a=c=1)



 $sh = a \cdot \overline{b} + b \cdot c$ (a) Karnaugh map and schematic



RTL H 51 (b) Timing diagram

#### • E.g., dynamic hazard (a=c=1, d=0)



52

## Dealing with hazards

· Some hazards can be eliminated in theory



(c) Revised Karnaugh map and schematic to eliminate hazards

RTL Hardware Design Chapter 6 53

- Eliminating glitches is very difficult in reality, and almost impossible for synthesis
- Multiple inputs can change simultaneously (e.g., 1111=>0000 in a counter)
- How to deal with it?
- Ignore glitches in the transient period and retrieve the data after the signal is stabilized
- "Wait until the output is stabilized" the motivation behind the synchronous design methodology.

54

## Delay sensitive design and its danger

- Boolean algebra
  - the theoretical model for digital design and most algorithms used in synthesis process
  - algebra deals with the stabilized signals
- Delay-sensitive design
  - Depend on the transient property (and delay) of the circuit
  - Difficult to design and analyze

RTL Hardware Design

Chapter 6

55

 E.g., hazard elimination circuit: ac term is not needed





E.g., edge detection circuit (pulse=a a')



a a\_not pulse

RTL Hardware Design Chapter 6 56

- · What's can go wrong:
  - E.g., pulse <= a and (not a);</p>
  - During logic synthesis, the logic expressions will be rearranged and optimized.
  - During technology mapping, generic gates will be remapped
  - During placement & routing, wire delays may change
  - It is bad for testing verification
- If delay-sensitive design is really needed, it should be done manually, not by synthesis

RTL Hardware Design Chapter 6 57

## 6. Sequential Circuit Timing analysis

- Combinational circuit:
  - characterized by propagation delay
- Sequential circuit:
  - Has to satisfy setup/hold time constraint
  - Characterized by maximal clock rate
     (e.g., 200 MHz counter, 2.4 GHz Pentium II)
  - Setup time and clock-to-q delay of register and the propagation delay of next-state logic are embedded in clock rate

- state next must satisfy the constraint
- Must consider effect of
  - state\_reg: can be controlled
  - synchronized external input (from a subsystem of same clock)
  - unsynchronized external input
- Approach
  - First 2: adjust clock rate to prevent violation
  - Last: use "synchronization circuit" to resolve violation





$$t_3 = t_0 + T_{cq} + T_{next(max)}$$

$$t_4 = t_5 - T_{setup} = t_0 + T_c - T_{setup}$$

$$t_3 < t_4$$

$$t_0 + T_{cq} + T_{next(max)} < t_0 + T_c - T_{setup}$$

$$T_{cq} + T_{next(max)} + T_{setup} < T_c$$

$$T_{c(min)} = T_{cq} + T_{next(max)} + T_{setup}$$

RTL Hardware Design by P. Chu

Chapter 8

61

#### • E.g., shift register; let $T_{cq} = 1.0$ ns $T_{setup} = 0.5$ ns



$$T_{c(min)}=T_{cq}+T_{setup}=1.5 \text{ ns}$$
 
$$f_{max}=\frac{1}{T_{cq}+T_{setup}}=\frac{1}{1.5 \text{ ns}}\approx 666.7 \text{ MHz}$$

RTL Hard by P. Chu

#### • E.g., Binary counter; let $T_{cq} = 1.0$ ns $T_{setup} = 0.5$ ns



| width |      |     |       | VE    | IDL op  | perator |        |       |       |     |  |  |  |
|-------|------|-----|-------|-------|---------|---------|--------|-------|-------|-----|--|--|--|
|       | nand | xor | $>_a$ | $>_d$ | =       | $+1_a$  | $+1_d$ | $+_a$ | $+_d$ | mux |  |  |  |
|       |      |     |       | are   | a (gate | count)  | )      |       |       |     |  |  |  |
| 8     | 8    | 22  | 25    | 68    | 26      | 27      | 33     | 51    | 118   | 21  |  |  |  |
| 16    | 16   | 44  | 52    | 102   | 51      | 55      | 73     | 101   | 265   | 42  |  |  |  |
| 32    | 32   | 85  | 105   | 211   | 102     | 113     | 153    | 203   | 437   | 85  |  |  |  |
| 64    | 64   | 171 | 212   | 398   | 204     | 227     | 313    | 405   | 755   | 171 |  |  |  |
|       |      |     |       |       | delay   | (ns)    |        |       |       |     |  |  |  |
| 8     | 0.1  | 0.4 | 4.0   | 1.9   | 1.0     | 2.4     | 1.5    | 4.2   | 3.2   | 0.3 |  |  |  |
| 16    | 0.1  | 0.4 | 8.6   | 3.7   | 1.7     | 5.5     | 3.3    | 8.2   | 5.5   | 0.3 |  |  |  |
| 32    | 0.1  | 0.4 | 17.6  | 6.7   | 1.8     | 11.6    | 7.5    | 16.2  | 11.1  | 0.3 |  |  |  |
| 64    | 0.1  | 0.4 | 35.7  | 14.3  | 2.2     | 24.0    | 15.7   | 32.2  | 22.9  | 0.3 |  |  |  |

RTL Hardware Design by P. Chu

Chapter 8

63

$$\begin{split} f_{max} &= \frac{1}{T_{cq} + T_{8\_bit\_inc(area)} + T_{setup}} = \frac{1}{1 \text{ ns} + 2.4 \text{ ns} + 0.5 \text{ ns}} \approx 256.4 \text{ MHz} \\ f_{max} &= \frac{1}{T_{cq} + T_{16\_bit\_inc(area)} + T_{setup}} = \frac{1}{1 \text{ ns} + 5.5 \text{ ns} + 0.5 \text{ ns}} \approx 142.9 \text{ MHz} \\ f_{max} &= \frac{1}{T_{cq} + T_{32\_bit\_inc(area)} + T_{setup}} = \frac{1}{1 \text{ ns} + 11.6 \text{ ns} + 0.5 \text{ ns}} \approx 76.3 \text{ MHz} \end{split}$$

$$f_{max} = \frac{1}{T_{cq} + T_{8\_bit\_inc(delay)} + T_{setup}} = \frac{1}{1~\text{ns} + 1.5~\text{ns} + 0.5~\text{ns}} \approx 333.3~\text{MHz}$$
 
$$f_{max} = \frac{1}{T_{cq} + T_{16\_bit\_inc(delay)} + T_{setup}} = \frac{1}{1~\text{ns} + 3.3~\text{ns} + 0.5~\text{ns}} \approx 208.3~\text{MHz}$$
 and

$$f_{max} = \frac{1}{T_{cq} + T_{32\_bit\_inc(delay)} + T_{setup}} = \frac{1}{1 \text{ ns} + 7.5 \text{ ns} + 0.5 \text{ ns}} \approx 111.1 \text{ MHz}$$

RTL Hardware Design by P. Chu

Chapter 8

64



$$t_2 = t_0 + T_{cq} + T_{next(min)}$$

$$t_h = t_0 + T_{hold}$$

$$t_h < t_2$$

$$T_{hold} < T_{cq} + T_{next(min)}$$

$$T_{hold} < T_{cq}$$

# Output delay



$$T_{co} = T_{cq} + T_{output}$$

RTL Hardware Design by P. Chu

Chapter 8

67

# 7. Poor design practice and remedy

- Synchronous design is the most important methodology
- Poor practice in the past (to save chips)
  - Misuse of asynchronous reset
  - Misuse of gated clock
  - Misuse of derived clock

# Misuse of asynchronous reset

- Poor design: use reset to clear register in normal operation.
- e.g., a poorly mod-10 counter
  - Clear register immediately after the counter reaches 1010

```
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
entity mod10_counter is
    port(
        clk, reset: in std_logic;
        q: out std_logic_vector(3 downto 0)
        );
end mod10_counter;
RTL Hardware Design Chapter 9 69
by P. Chu
```



Figure 9.1 Decade counter using asynchronous reset

```
architecture poor_async_arch of mod10_counter is
       signal r_reg: unsigned(3 downto 0);
       signal r_next: unsigned(3 downto 0);
       signal async_clr: std_logic;
   begin
                                                    r_next
      -- register
      process (clk, async_clr)
      begin
                                               =10
          if (async_clr='1') then
             r_reg \ll (others => '0'); reset
          elsif (clk'event and clk='1') then
             r_reg <= r_next;
          end if;
      end process;
      -- asynchronous clear
      async_clr <= '1' when (reset='1' or r_reg="1010") else
                     0';
      -- next state logic
      r_next <= r_reg + 1;
      -- output logic
      q <= std_logic_vector(r_reg);</pre>
   end poor_async_arch;
RTL Hardware Design
                              Chapter 9
                                                                71
by P. Chu
```

#### Problem

- Glitches in transition 1001 (9) => 0000 (0)
- Glitches in aync\_clr can reset the counter
- How about timing analysis? (maximal clock rate)
- Asynchronous reset should only be used for power-on initialization



Chapter 9

73

### Remedy: load "0000" synchronously

```
architecture two_seg_arch of mod10_counter is
   signal r_reg: unsigned(3 downto 0);
   signal r_next: unsigned(3 downto 0);
begin
  -- register
   process (clk,reset)
   begin
      if (reset='1') then
         r_reg \ll (others => ,0); reset-
      elsif (clk'event and clk='1') then
         r_reg <= r_next;
      end if;
   end process;
   -- next state logic
   r_next <= (others=>'0') when r_reg=9 else
              r_reg + 1;
   -- output logic
   q <= std_logic_vector(r_reg);</pre>
end two_seg_arch;
```

RTL Hardware Design by P. Chu

Chapter 9

## Misuse of gated clock

- Poor design: use a and gate to disable the clock to stop the register to get new value
- E.g., a counter with an enable signal

```
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
entity binary_counter is
   port (
      clk, reset: in std_logic;
      en: in std logic;
      q: out std_logic_vector(3 downto 0)
      );
end binary_counter;
```

RTL Hardware Design by P. Chu

Chapter 9



Figure 9.2 Disabling FF with gated clock

```
architecture gated_clk_arch of binary_counter is
   signal r_reg: unsigned(3 downto 0);
   signal r_next: unsigned(3 downto 0);
   signal gated_clk: std_logic;
begin
   -- register
   process (gated_clk,reset)
   begin
      if (reset='1') then
         r_reg <= (others=>'0');
      elsif (gated_clk'event and gated_clk = '1') then
         r_reg <= r_next;
      end if;
   end process;
   — gated clock
                                           r_next
                                                     r_req
   gated_clk <= clk and en;
   -- next state logic
   r_next <= r_reg + 1;
   — output logic
   q <= std_logic_vector(r_reg);</pre>
end gated_clk_arch;
```

Chapter 9

77

#### Problem

- Gated clock width can be narrow
- Gated clock may pass glitches of en
- Difficult to design the clock distribution network

Remedy: use a synchronous enable

```
architecture two_seg_arch of binary_counter is
         signal r_reg: unsigned(3 downto 0);
         signal r_next: unsigned(3 downto 0);
     begin
        -- register
         process (clk, reset)
         begin
            if (reset='1') then
               r_reg <= (others=>'0');
            elsif (clk'event and clk='1') then
               r_reg <= r_next;
            end if;
         end process;
        -- next state logic
         r_next <= r_reg + 1 when en='1' else
                    r_reg;
        — output logic
         q <= std_logic_vector(r_reg);</pre>
     end two_seg_arch;
RTL Hardware Design
                           Chapter 9
                                                           79
by P. Chu
```

## Misuse of derived clock

- Subsystems may run at different clock rate
- Poor design: use a derived slow clock for slow subsystem



#### Problem

- Multiple clock distribution network
- How about timing analysis? (maximal clock rate)

RTL Hardware Design by P. Chu

Chapter 9

81

#### • Better use a synchronous one-clock enable pulse



### • E.g., second and minutes counter

– Input: 1 MHz clock

- Poor design:



(a) Design with derived clock

RTL Hardware Design by P. Chu

Chapter 9

83

#### - Better design



(b) Design with a single synchronous clock

#### VHDL code of poor design

```
library ieee;
  use ieee.std_logic_1164.all;
  use ieee.numeric_std.all;
  entity timer is
     port (
        clk, reset: in std_logic;
        sec, min: out std_logic_vector(5 downto 0)
  end timer;
  architecture multi_clock_arch of timer is
     signal r_reg: unsigned(19 downto 0);
     signal r_next: unsigned(19 downto 0);
     signal s_reg, m_reg: unsigned(5 downto 0);
     signal s_next, m_next: unsigned(5 downto 0);
     signal sclk, mclk: std_logic;
  begin
RTL Hardware Design
                        Chapter 9
                                                    85
by P. Chu
```

```
-- register
                                               mod-60
process (clk, reset)
                                      mod-1M
                                                       mod-60
                                               counter
                                                       counter
                                      counter
begin
    if (reset='1') then
                                      (a) Design with derived clock
       r_reg <= (others=>,0,);
    elsif (clk'event and clk='1') then
       r_reg <= r_next;
    end if;
end process;
-- next state logic
r_next \ll (others=>,0) when r_reg=99999 else
            r_reg + 1;
-- output logic
sclk <= '0' when r_reg < 500000 else
          11:
```

```
-- second divider
                                    mod-1M
                                            mod-60
                                                     mod-60
                                    counter
                                            counter
                                                     counter
process (sclk, reset)
begin
                                    (a) Design with derived clock
    if (reset='1') then
       s_reg <= (others=>'0');
    elsif (sclk'event and sclk='1') then
       s_reg <= s_next;
   end if:
end process;
-- next state logic
s_next <= (others=>'0') when s_reg=59 else
            s_{reg} + 1;
-- output logic
mclk <= '0' when s_reg < 30 else
sec <= std_logic_vector(s_reg);
```

Chapter 9

```
mod-1M
                                                mod-60
                                                         mod-60
   -- minute divider
                                       counter
                                                counter
                                                         counter
   process (mclk, reset) ck
   begin
                                       (a) Design with derived clock
       if (reset='1') then
           m_reg <= (others=>'0');
       elsif (mclk'event and mclk='1') then
           m_reg <= m_next;</pre>
       end if;
   end process;
   -- next state logic
   m_next <= (others=>'0') when m_reg=59 else
               m_{reg} + 1;
   -- output logic
   min <= std_logic_vector(m_reg);
end multi_clock_arch;
```

#### Remedy: use a synchronous 1-clock pulse

```
architecture single_clock_arch of timer is
         signal r_reg: unsigned(3 downto 0);
         signal r_next: unsigned(3 downto 0);
         signal s_reg, m_reg: unsigned(5 downto 0);
         signal s_next, m_next: unsigned(5 downto 0);
         signal s_en, m_en: std_logic;
       begin
          -- register
          process (clk, reset)
          begin
              if (reset='1') then
                 r_reg <= (others=>'0');
                 s_reg <= (others=>'0');
                 m_reg <= (others=>,0);
              elsif (clk'event and clk='1') then
                 r_reg <= r_next;
                                                        s en
                 s_reg <= s_next;
                                                    mod-1M
                                                             mod-60
                                                                       mod-60
                 m_reg <= m_next;</pre>
                                                    counter
                                                             counter
                                                                       counter
             end if;
          end process;
                                                  (b) Design with a single synchronous clock
RTL Hardware Design
                                                                     89
                               Chapter 9
by P. Chu
```

```
— next state logic/output logic for mod-1000000 counter
   r_next \le (others=>,0) when r_reg=999999 else
              r_reg + 1;
   s_{en} \le 1' when r_{reg} = 500000 else
            ,0,:
   — ext state logic/output logic for second divider
   s_next <= (others=>'0') when (s_reg=59 and s_en='1') else
                              when s_en='1' else
              s_reg + 1
              s_reg;
   m_{en} \le 1' when s_{eg} = 30 and s_{en} = 1' else
            ,0,:
   -- next state logic for minute divider
   m_next \le (others=>'0') when (m_reg=59 \text{ and } m_en='1') else
                              when m_en='1' else
              m_reg + 1
              m_reg;
   -- output logic
   sec <= std_logic_vector(s_reg);</pre>
                                                 mod-1M
                                                          mod-60
                                                                   mod-60
   min <= std_logic_vector(m_reg);
                                                  counter
                                                                   counter
end single_clock_arch;
```

Chapter 9

RTL Hardware Design

by P. Chu

(b) Design with a single synchronous clock

## A word about power

- Power is a major design criteria now
- In CMOS technology
  - Dynamic power is proportional to the switching frequency of transistors
  - High clock rate implies high switching freq
- Clock manipulation
  - Can reduce switching frequency
  - But should not be done at RT level

RTL Hardware Design by P. Chu

Chapter 9

- Development flow:
  - Design/synthesize/verify a regular synchronous subsystems
  - 2(a). Derived clock: use special circuit (PLL etc.) to obtain derived clocks
  - 2(b). Gated clock: use "power optimization" software tool to convert some register into gated clock

## 8. Clock rate and Performance of FSMD

- Maximal clock rate
  - More difficult to analyze because of two interactive loops
  - The boundary of the clock rate can be found

RTL Hardware Design by P. Chu

Chapter 11

93

## Basic Block Diagram of FSMD



control path

RTL Hardware Design by P. Chu

Chapter 11

- Best-case scenario:
  - Control signals needed at late stage
  - Status signal available at early stage



Chapter 11

95



$$T_c = T_{cq(data)} + T_{dp} + T_{setup(data)}$$

RTL Hardware by P. Chu

- Worst-case scenario:
  - Control signals needed at early stage
  - Status signal available at late stage



Chapter 11



$$T_c = T_{cq(state)} + T_{output} + T_{dp} + T_{next} + T_{setup(state)}$$

$$T_{cq} + T_{dp} + T_{setup} \le T_c \le T_{cq} + T_{output} + T_{dp} + T_{next} + T_{setup}$$

$$\frac{1}{T_{cq} + T_{output} + T_{dp} + T_{next} + T_{setup}} \le f \le \frac{1}{T_{cq} + T_{dp} + T_{setup}}$$

Chapter 11

- Performance of FSMD
  - Tc: Clock period
  - K: # clock cycles to compete the computation
  - Total time = K \* Tc
- K determined by algorithm, input patterns etc.

- 8-bit input
  - Best: b=0, K=2
  - Worst: b=255, K=257
- N-bit input:
  - Worst:

$$K = 2+(2^n-1)$$

RTL Hardware Design by P. Chu

- 8-bit input
  - Best: b=0, K=2
  - Worst: b=255,K=2 + 255\*2
- N-bit input:
  - Worst:K=2+2\*(2<sup>n</sup>-1)



## Clock and Synchronization

RTL Hardware Design by P. Chu

Chapter 16

103

## **Outline**

- 1. Why synchronous?
- 2. Clock distribution network and skew
- 3. Multiple-clock system
- 4. Meta-stability and synchronization failure
- 5. Synchronizer
- 6. Enable tick crossing clock domain
- 7. Handshaking
- 8. Data transfer crossing clock domains

## 1. Why synchronous

RTL Hardware Design by P. Chu

Chapter 16

105

### Timing of a combinational digital system

- Steady state
  - Signal reaches a stable value
  - Modeled by Boolean algebra
- Transient period
  - Signal may fluctuate
  - No simple model
- Propagation delay: time to reach the steady state

## **Timing Hazards**

- Hazards: the fluctuation occurring during the transient period
  - Static hazard: glitch when the signal should be stable
  - Dynamic hazard: a glitch in transition
- Due to the multiple converging paths of an output port

RTL Hardware Design by P. Chu

Chapter 16

107

E.g., static-hazard (sh=ab'+bc; a=c=1)





(a) Karnaugh map and schematic



RTL Harris

(b) Timing diagram

#### • E.g., dynamic hazard (a=c=1, d=0)



E.g., Hazard of circuit with closed feedback loop (async seq circuit)



(a) D latch

q\*

q

d

```
library ieee;
use ieee.std_logic_1164.all;
entity dlatch is
   port (
      c: in std_logic;
      d: in std_logic;
      q: out std_logic
   );
end dlatch;
architecture demo_arch of dlatch i:
   signal q_latch: std_logic;
begin
   process (c, d, q_latch)
   begin
      if (c='1') then
         q_latch <= d;
      else
         q_latch <= q_latch;
      end if;
   end process;
   q <= q_latch;
end demo_arch;
```

RTL Hardware Design by P. Chu



## Dealing with hazards

 In a small number of cases, additional logic can be added to eliminate race (and hazards).



(c) Revised Karnaugh map and schematic to eliminate hazards

- This is not feasible for synthesis
- What's can go wrong:
  - During logic synthesis, the logic expressions will be rearranged and optimized.
  - During technology mapping, generic gates will be re-mapped
  - During placement & routing, wire delays may change
  - It is bad for testing verification

Chapter 16

- Better way to handle hazards
  - Ignore glitches in the transient period and retrieve the data after the signal is stabilized
- In a sequential circuit
  - Use a clock signal to sample the signal and store the stable value in a register.
  - But register introduces new timing constraint (setup time and hold time)



- Synchronous system:
  - group registers into a single group and drive them with the same clock
  - Timing analysis for a single feedback loop



## Synchronous circuit and EDA

- Synthesis: reduce to combinational circuit synthesis
- Timing analysis: involve only a single closed feedback loop (others reduce to combinational circuit analysis)
- Simulation: support "cycle-based simulation"
- Testing: can facilitate scan-chain

# 2. Clock distribution network and skew

RTL Hardware Design by P. Chu

Chapter 16

117

## Clock distribution network

- Ideal clock: clock's rising edges arrive at FFs at the same time
- Real implementation:
  - Driving capability of each cell is limited
  - Need a network of buffers to drive all FFs
  - In ASIC: done by clock synthesis (a step in physical synthesis)
  - In FPGA: pre-fabricated clock distribution network

Block diagram

Ideal H-routing





RTL Hardware Design by P. Chu

Chapter 16

119

# Clock skew (时钟偏移)

 Skew: time difference between two arriving clock edges



RTL Hardware Design by P. Chu



Chapter 16

# Timing analysis

- Setup time constraint (impact on max clock rate)
- Hold time constraint

by P. Chu



121

 $t_5$  $t_3 < t_4$ clk1 clk2  $t_3 = t_0 + T_{cq} + T_{next(max)}$ T<sub>skew</sub>  $t_4 = t_5 - T_{setup} = (t_0 + T_c + T_{skew}) - T_{setup}$ T<sub>skew</sub> r1\_reg T<sub>next(max)</sub> next-state r2\_next Tnext(min) invalid value (transient period) Thold

$$t_3 < t_4$$

$$t_3 = t_0 + T_{cq} + T_{next(max)}$$

$$t_4 = t_5 - T_{setup} = (t_0 + T_c + T_{skew}) - T_{setup}$$

$$T_{cq} + T_{next(max)} + T_{setup} - T_{skew} < T_c$$

$$T_{c(min)} = T_{cq} + T_{next(max)} + T_{setup} - T_{skew}$$

 Clock skew actually helps increasing clock rate in this particular case

RTL Hardware Design by P. Chu

Chapter 16

123

· If the clock signal travels from the opposite direction

$$T_{c(min)} = T_{cq} + T_{next(max)} + T_{setup} + T_{skew}$$

- · Normally we have to consider the worst case since
  - No control on clock routing during synthesis
  - Multiple feedback paths



RTL Hardware Design by P. Chu

#### Hold time constraint

$$t_h < t_2$$

$$t_2 = t_0 + T_{cq} + T_{next(min)}$$

$$t_h = t_0 + T_{hold} + T_{skew}$$

$$T_{hold} < T_{cq} + T_{next(min)} - T_{skew}$$

$$T_{hold} < T_{cq} - T_{skew}$$

- Skew may reduce hold time margin
- Hold time violation cannot be corrected in RT level

RTL Hardware Design by P. Chu

Chapter 16

125

#### Summary

- Clock skew normally has negative impact on synchronous sequential circuit
- Effect on setup time constraint: require to increase clock period (i.e., reduce clock rate)
- Effect on hold time constraint: may introduce hold time violation
  - Can only be fixed during physical synthesis: re-route clock; re-place register and comb logic; add artificial delay logic
- Skew within 10% of clock period tolerable

## 3. Multiple-clock system

RTL Hardware Design by P. Chu

Chapter 16

127

## Why multiple clocks

- Inherent multiple clock sources
  - E.g., external communication link
- Circuit size
  - Clock skew increases with the # FFs in a system
  - Current technology can support up to 10<sup>4</sup> FFs
- Design complexity
  - E.g., as sysetm w/ 16-bit 20 MHz processor, 1-bit 100 MHz serial interface, 1 MHz I/O controller
- Power consideration
  - Dynamic power proportional to switching freq

## Derived vs Independent clocks

- Independent clocks:
  - Relationship between the clocks is unknown
- Derived clocks:
  - A clock is derived from another clock signals (e.g., different clock rate or phase)
  - Relationship is known
  - Logic for the derived clock should be separated from regular logic and manually synthesized (e.g., special delay line or PLL)
  - A system with derived clock can still be treated and analyzed as a synchronous system

RTL Hardware Design by P. Chu

Chapter 16

129

## Poorly conceived clock divider





RTL Hardware Design

Chapter 6

### **GALS**

- Globally asynchronous locally synchronous system
  - Partition a system into multiple clock domains
  - Design and verify subsystem in same clock domain as a synchronous system
  - Design special interface between clock domains

RTL Hardware Design by P. Chu

Chapter 16

131

# 4. Meta-stability and synchronization failure

#### Timing analysis of a synchronous system

- To satisfy setup time constraint:
  - Signal from the state register
    - Controlled by clock
    - Adjust clock period to <u>avoid</u> setup time violation
  - Signal from external input
    - Same if the external input comes from another synchronous subsystem
    - Otherwise, have to <u>deal with the occurrence</u> of setup time violation.

RTL Hardware Design by P. Chu

Chapter 16



## Metastability

What happens after timing violation?



- Output of FF becomes 1 (sampled old input value)
- Output of FF becomes 0 (sampled new input value)
- FF enters metastable state, the output exhibits an "in-between" value
  - FF eventually "resolves" to one of stable states
  - The resolution time is a random variable with distribution function ( $\tau$  is decay constant)

$$P(T_r) = e^{-\frac{T_r}{\tau}}$$

• The probability that metastability persists beyond  $T_r$  (i.e., cannot be resolved within  $T_r$ )

by P. Chu

# $MTBF(T_r)$

- Synchronization failure
  - an FF cannot resolve the metastable condition within the given time

#### MTBF

- Mean Time Between synchronization Failures
- Basic criterion for metastability analysis
- Frequently expressed as a function of  $T_r$  (resolution time provided)

RTL Hardware Design by P. Chu

Chapter 16

137

#### MTBF computation

- $R_{meta}$ : The average rate at which an FF enters the metastable state.
- $P(T_r)$ : The probability that an FF cannot resolve the metastable condition within  $T_r$ .

$$R_{meta} = w * f_{clk} * f_d$$

w is the susceptible time window.

$$P(T_r) = e^{-\frac{T_r}{\tau}}$$

$$AF(T_r) = R_{meta} * P(T_r) = w * f_{clk} * f_d * e^{-\frac{T_r}{\tau}}$$

$$MTBF(T_r) = \frac{1}{AF(T_r)} = \frac{e^{\frac{T_r}{\tau}}}{w * f_{clk} * f_d}$$

RTL Hardware Design by P. Chu

Chapter 16

• E.g., w=0.1ns,  $\tau$ =0.5ns,  $f_{clk}$ =50MHz,  $f_{d}$ =0.1 $f_{clk}$ 

| $T_r$   | MTBF                                             |
|---------|--------------------------------------------------|
| 0.0 ns  | $4.00 * 10^{-05} sec (0.04 msec)$                |
| 2.5 ns  | $5.94 * 10^{-03} $ sec (5.94 msec)               |
| 5.0 ns  | $8.81 * 10^{-01} $ sec (0.88 sec)                |
| 7.5 ns  | $1.31 * 10^{+02} $ sec (131 sec)                 |
| 10.0 ns | $1.94 * 10^{+04}$ sec (5.39 hours)               |
| 12.5 ns | $2.88 * 10^{+06}$ sec (3.33 days)                |
| 15.0 ns | $4.27 * 10^{+08}$ sec (1.36 years)               |
| 17.5 ns | $6.34 * 10^{+10} $ sec $(2.01 * 10^3 $ years)    |
| 20.0 ns | $9.42 * 10^{+12} $ sec $(2.99 * 10^5 $ years)    |
| 22.5 ns | $1.40 * 10^{+15} $ sec $(4.43 * 10^7 $ years)    |
| 25.0 ns | $2.07 * 10^{+17} $ sec $(6.58 * 10^9 $ years)    |
| 27.5 ns | $3.08 * 10^{+19} $ sec $(9.76 * 10^{11} $ years) |
| 30.0 ns | $4.57 * 10^{+21} $ sec $(1.45 * 10^{14} $ years) |
| 32.5 ns | $6.78 * 10^{+23} $ sec $(2.15 * 10^{16} $ years) |
| 35.0 ns | $1.01 * 10^{+26} $ sec $(3.19 * 10^{18} $ years) |

RTL Hardware Design by P. Chu

Chapter 16

139

#### Observations

- MTBF is statistical average
- Only  $T_r$  can be adjusted in practical design
- MTBF is extremely sensitive to  $T_r$ 
  - Good: synchronization failure can be easily avoided by providing additional resolution time
  - Bad: minor modification can introduce synchronization failure

## 5. Synchronizer

RTL Hardware Design by P. Chu

Chapter 16

- Synchronization circuit:
  - Synchronize an asynchronous input with system clock
  - No physical circuit can prevent metastability
  - Synchronizer just provides enough time for the metastable condition to be "resolved"
- E.g.,
  - w = 0.1ns,  $\tau = 0.5$ ns,  $f_{c/k} = 50$ MHz,  $f_{d} = 0.1 f_{c/k}$
  - $-T_{\text{setup}} = 2.5$ s



# No synchronizer

- $T_r = 0$
- MTBF(0) = 0.04 ms

## One-FF synchronizer

• 
$$T_r = T_c - (T_{comb} + T_{setup})$$

- $T_r$  depends on  $T_c$ ,  $T_{setup}$  and  $T_{comb}$ 
  - $-T_c$ : vary with system specification
  - T<sub>comb</sub>: vary with circuit, synthesis (gate delay), placement & routing (wire delay)
- E.g.,

$$-T_r = 20 - (T_{comb} + 2.5) = 17.5 - T_{comb}$$

- $-T_{comb}$  = 1ns,  $T_r$  = 16.5ns; MTBF(16.5) = 272yr
- $-T_{comb}$  = 12.5ns,  $T_r$  = 5ns; MTBF(5) = 0.88ns
- Not a reliable design

RTL Hardware Design by P. Chu

Chapter 16

145

### Two-FF synchronizer

- Add an extra FF to eliminate T<sub>comb</sub>
  - $-T_r = T_c T_{setup}$
  - $-T_r$  depends on  $T_c$  only
  - Async input delayed by two clock cycles
- E.g.,

$$-T_r$$
 = 20 - 2.5 = 17.5; MTBF(17.5) = 3,000yr

- Most commonly used synchronizer
- In ASIC technology
  - May have "metastability-hardened" D FF cell (large area)

# Three-FF synchronizer

 Add an extra stage to increase resolution time

$$-T_r = 2(T_c - T_{\text{setup}})$$

- Async input delayed by three clock cycles
- E.g.,

$$-T_r = 2*(20 - 2.5)$$
; MTBF(30)=6 billion yr

Hardly needed

RTL Hardware Design by P. Chu

Chapter 16

147

#### Observation

- T<sub>r</sub> is in the exponent of MTBF equation
- Small variation in  $T_r$  can lead to large swing in MTBF

## Proper use of synchronizer

- Use a glitch-free signal for synchronization
- Synchronize a signal in a single place
- Avoid synchronization multiple "related" signals.
- Reanalyze the synchronizer after each design change

RTL Hardware Design by P. Chu

Chapter 16







# Why synchronization is a "tricky" issue

- Metastability is basically an "analog" phenomena
- Metastability behavior is described by random variable
- Metastability cannot be easily modeled or simulated in gate level (only 'X')
- Metastability cannot be easily observed or measured in physical circuit (e.g., MTBF = 3 months)
- MTBF is very sensitive to circuit revision

RTL Hardware Design by P. Chu

Chapter 16

151

# Enable tick crossing clock domains

# Signals crossing clock domains

#### Synchronizer

- Just ensures that the receiving system does not enter a metastable state
- Not guarantee the "function" of the received signal
- Consideration
  - One signal
  - Multiple signals ("bundled data")

RTL Hardware Design by P. Chu

Chapter 16

153

# Domain-crossing of an enable signal

- An enable tick
  - One-clock-cycle wide
  - To be sample in a single clock edge
  - E.g., enable input of a counter; read/write signal of a FIFO buffer
  - Can also be used to retrieve bundled data

# "Wide" enable signal

 From a slow clock domain to a fast clock domain (e.g., 1 MHz to 10 MHz)



#### • Will this work?



# "Narrow" enable signal

- From a fast clock domain to a slow clock domain (e.g., 10 MHz to 1 MHz)
- The enable pulse is probably too narrow to be detected
- Need to "stretch" the pulse
  - Cannot be done by a normal sequential circuit
  - Need to use "tricks"

RTL Hardware Design by P. Chu

Chapter 16



- en\_q asserted at the rising edge of en\_in
- en\_q then synchronized
- en\_strobe then clears stretcher
- en\_q may last over two clock cycles and thus an edge-detector is needed
- Can this scheme be used for wide-pulse?

## Level-alternating scheme

- Output interface of sender and input interface of receiver modified for domain crossing
- Output interface converts an "edge-sensitive" enable pulse to a level-alternating signal
  - Use a T-FF
- Input interface converts the level-alternating signal back to "edge-sensitive" enable pulse
  - Use a dual-edge detector
- Eliminate the ad-hoc stretcher and follow the synchronous design methodology

RTL Hardware Design by P. Chu

Chapter 16

159



(a) Block diagram of the sending subsystem



(b) Block diagram of the receiving subsystem



RTL Hardwa by P. Chu

(c) Simplified timing diagram

## 7. Handshaking

RTL Hardware Design by P. Chu

Chapter 16

- How to control the rate of data (or number of enable ticks) between two clock domains? (e.g., 10 MHz system to 1 MHz system)
- Does the sending system have prior knowledge about the processing speed of receiving system?
- Handshaking scheme
  - Use a feedback signal
  - Make minimal assumption about the receiving system





RTL Hardware Design Chapter 16
by P. Chu



#### Four phases:

- Phase 1: talker activates req
- Phase 2: listener activates ack
- Phase 3: talker de-activates req
- Phase 4: listener de-activates ack
- Talker can start a new request

#### Need synchronizer if talker listener in different clock domains



RTL Hardware Design by P. Chu

Chapter 16

165

#### Talker FSM and listener FSM



166

Talker FSM

Listener FSM

- Implementation:
  - Talker: FSM and synchronizer for ack out
  - Listener: FSM and synchronizer for req\_out
- Pass an enable tick using handshaking
  - The enable tick functions as the start signal in talker
  - The listener generates a Mealy output which is asserted when req\_sync is asserted in the s\_ack0 state (i.e., a rising-edge detection circuit for req\_sync)

RTL Hardware Design by P. Chu

Chapter 16

167

Can we remove the second part of handshaking?





(b) Error due to slow talker response



(c) Error due to slow listener response

RTL Hardware Design by P. Chu

Chapter 16

- Two-phase handshaking protocol
  - We can modify the 4-phase protocol so that talker/listener not returning to 0
  - May not be proper for certain applications



RTL Hardware Design by P. Chu

Chapter 16

169

## Data transfer crossing clock domains

- It is difficult to synchronize a multiple-bit signal (e.g., signal changes from 11 to 00)
- Use req/ack and handshaking protocol to coordinate data transfer
  - Only one signal needs to be synchronized in each domain
  - All other signals are bundled as "data"

RTL Hardware Design by P. Chu

Chapter 16

171

- Push operation (talker sending data)
  - Conceptual diagrams





RTL Hardware by P. Chu

- More detailed diagram
  - Talker activates req\_out and tri\_en (i.e., placing data on data bus) at the same time.
  - req\_out is delayed one or two clocks when synchronized in listener
  - data is stabilized when data\_en is asserted (i.e., no timing violation)



#### Pull operation (taller retrieving data)

#### Conceptual diagrams



#### · Bidirectional operation is possible; e.g.,



175

#### • Performance:

- How many clock cycle for one data transfer?
- Other methods for data transfer
  - FIFO (synchronization needed for empty and full status signal)
  - Shared memory (synchronization needed for arbitration circuit)
  - Dual-port memory (meta-stable condition may occur in the internal arbitration circuit)



### Input-related timing consideration

